A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations
Internal identifier: 001129 (Main/Exploration); previous: 001128; next: 001130
Authors: Mona Omidyeganeh [Iran]; Reza Azmi [Iran]; Kambiz Nayebi [Iran, United States]; Abbas Javadtalab [Iran, United States]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ] ; 2006.
Abstract
A new segmentation algorithm for multifont Farsi/Arabic texts, based on conditional labeling of up and down contours, was presented in [1]. A preprocessing technique adjusted the local baseline for each subword. The adaptive baseline, the up and down contours, and their curvatures were used to improve the segmentation results. The algorithm correctly segments 97% of 22236 characters in 18 fonts. However, achieving high performance in the multifont case is challenging because each font has different characteristics. Here we propose considering some extra classes in the recognition stage. The extra classes are parts of characters, or combinations of two or more characters, that cause most of the errors in the segmentation stage. These extra classes are determined statistically. We used a training document of 4820 characters in 4 fonts. The segmentation result improves from 96.7% to 99.64%.
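The abstract says only that the extra classes are "determined statistically" and gives no procedure. As a minimal sketch of one way this could be done, the Python below counts how often each mis-segmented unit occurs in a training document and keeps the units responsible for a large share of the errors. The function name `select_extra_classes`, the `min_share` threshold, and the unit labels are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def select_extra_classes(error_units, min_share=0.05):
    """Return the units that account for at least `min_share` of all
    segmentation errors, most frequent first.

    `error_units` is a list of labels, one per observed segmentation
    error (hypothetical format: a character part or a character
    combination that the segmenter failed to split correctly).
    """
    counts = Counter(error_units)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() yields (label, count) pairs in descending count order.
    return [unit for unit, n in counts.most_common() if n / total >= min_share]


# Hypothetical error log from a training document (labels are made up).
errors = ["lam+alef"] * 6 + ["sin-part"] * 3 + ["kaf+alef"] * 1
extra_classes = select_extra_classes(errors, min_share=0.2)
print(extra_classes)
```

In this sketch the selected units would be added to the recognizer's label set, so that a frequent mis-segmentation is recognized as a class of its own instead of being counted as an error.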
DOI: 10.1007/978-3-540-69423-6_65
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001B02
- to stream Istex, to step Curation: 001992
- to stream Istex, to step Checkpoint: 000B02
- to stream Main, to step Merge: 001146
- to stream Main, to step Curation: 001129
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations</title>
<author><name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
</author>
<author><name sortKey="Azmi, Reza" sort="Azmi, Reza" uniqKey="Azmi R" first="Reza" last="Azmi">Reza Azmi</name>
</author>
<author><name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
</author>
<author><name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/978-3-540-69423-6_65</idno>
<idno type="url">https://api.istex.fr/document/F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001B02</idno>
<idno type="wicri:Area/Istex/Curation">001992</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B02</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Omidyeganeh M:a:new:method</idno>
<idno type="wicri:Area/Main/Merge">001146</idno>
<idno type="wicri:Area/Main/Curation">001129</idno>
<idno type="wicri:Area/Main/Exploration">001129</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations</title>
<author><name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Iran Telecommunication Research Center (ITRC), Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Azmi, Reza" sort="Azmi, Reza" uniqKey="Azmi R" first="Reza" last="Azmi">Reza Azmi</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Computer Dep., Azzahra University, Vanak, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Eng. Dep., Sharif University, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Computer Eng. Dep., Sharif University, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D</idno>
<idno type="DOI">10.1007/978-3-540-69423-6_65</idno>
<idno type="ChapterID">65</idno>
<idno type="ChapterID">Chap65</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: A new segmentation algorithm for multifont Farsi/Arabic texts, based on conditional labeling of up and down contours, was presented in [1]. A preprocessing technique adjusted the local baseline for each subword. The adaptive baseline, the up and down contours, and their curvatures were used to improve the segmentation results. The algorithm correctly segments 97% of 22236 characters in 18 fonts. However, achieving high performance in the multifont case is challenging because each font has different characteristics. Here we propose considering some extra classes in the recognition stage. The extra classes are parts of characters, or combinations of two or more characters, that cause most of the errors in the segmentation stage. These extra classes are determined statistically. We used a training document of 4820 characters in 4 fonts. The segmentation result improves from 96.7% to 99.64%.</div>
</front>
</TEI>
<affiliations><list><country><li>Iran</li>
<li>États-Unis</li>
</country>
</list>
<tree><country name="Iran"><noRegion><name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
</noRegion>
<name sortKey="Azmi, Reza" sort="Azmi, Reza" uniqKey="Azmi R" first="Reza" last="Azmi">Reza Azmi</name>
<name sortKey="Azmi, Reza" sort="Azmi, Reza" uniqKey="Azmi R" first="Reza" last="Azmi">Reza Azmi</name>
<name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
<name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
<name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
</country>
<country name="États-Unis"><noRegion><name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
</noRegion>
<name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001129 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001129 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D |texte= A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations }}
This area was generated with Dilib version V0.6.32.